"If stock market experts were so expert, they
would be buying stock, not selling advice."
- Norman Augustine, US aircraft businessman (1935 - )



I. INTRODUCTION 

The word "correlation" is defined as "a relation
existing between phenomena or things or between
mathematical or statistical variables which tend to
vary, be associated, or occur together in a way
not expected on the basis of chance alone" (see
http://www.m-w.com/dictionary/correlations). As soon
as we talk about "chance", the words "probability",
"random", etc., come to mind. So, when we talk
about correlations in stock prices, what we are really
interested in is the nature of the time series of stock
prices, the relation of stock prices to other variables
such as stock transaction volumes, and the statistical
distributions and laws which govern the price time series,
in particular whether the time series is random or not.
The first formal efforts in this direction were those of
Louis Bachelier, more than a century ago. Ever since,
financial time series analysis has been of prevalent
interest to theoreticians for making inferences and
predictions, though it is primarily an empirical
discipline. The uncertainty in financial time series and
its theory makes it especially interesting to statistical
physicists, besides financial economists [?]. One of the
most debatable issues in financial economics is whether
the market is "efficient" or not. An "efficient" asset
market is one in which the information contained in past
prices is instantly, fully and continually reflected in
the asset's current price. As a consequence, the more
efficient the market is, the more random is the sequence
of price changes generated by the market. Hence, the most
efficient market is one in which the price changes are
completely random and unpredictable. This leads to another
pertinent question of financial econometrics: whether asset prices
are predictable. The two simplest models of probability
theory and financial econometrics that deal with predicting
future price changes, the random walk theory and the
Martingale theory, assume that future price changes are
functions of past price changes only. In economics, the
"logarithmic return" is calculated using the formula



$$ r(t) = \ln P(t) - \ln P(t-1), \tag{1} $$
where P(t) is the price (index) at time step t. A main
characteristic of the random walk and Martingale models 
is that the returns are uncorrelated. 
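As a minimal illustration of Eq. (1), the following Python sketch (standard library only) turns a list of prices into logarithmic returns. The price values in the example are made up, and the function name `log_returns` is ours rather than part of any library. A useful sanity check is that log returns telescope: their sum equals $\ln P(t_{end}) - \ln P(t_{start})$.

```python
import math

def log_returns(prices):
    """Eq. (1): r(t) = ln P(t) - ln P(t-1) for each pair of consecutive prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical index values, for illustration only.
prices = [100.0, 101.0, 99.5, 102.0]
r = log_returns(prices)
print([round(x, 5) for x in r])
# The returns telescope: sum(r) equals ln(102 / 100).
print(sum(r), math.log(102.0 / 100.0))
```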

In the past, several hypotheses have been proposed to
model financial time series, and studies have been
conducted to explain their most characteristic features.
The study of long-time correlations in financial time
series is a very interesting and widely studied problem,
especially since such correlations give deep insight into
the underlying processes that generate the time series [?].
The complex nature of financial time series (see Fig. 1)
has in particular led physicists to add this system to the
list of dynamical systems that they study. Here, we will
not try to review all the studies, but instead give a brief
outlook of the studies done by the author and his
collaborators; motivated readers are kindly asked to
refer to the original papers for further details.



II. ANALYSING CORRELATIONS IN STOCK PRICE TIME SERIES

A. Financial correlation matrix and constructing asset trees

In our studies, we used two different sets of financial
data for different purposes. The first set, from the
Standard & Poor's 500 index (S&P500) of the New York
Stock Exchange (NYSE) from July 2, 1962 to December
31, 1997, contains 8939 daily closing values, whose
return series is plotted in Fig. 1(d). In the second set,
we study the split-adjusted daily closure prices for a
total of N = 477 stocks traded at the New York Stock
Exchange (NYSE) over a period of 20 years, from
02-Jan-1980 to 31-Dec-1999. This amounts to a total of
5056 price quotes per stock, indexed by the time variable
$\tau = 1, 2, \dots, 5056$.
FIG. 1: Comparison of several time series which are of
interest to physicists and economists: (a) Random time series
(3000 time steps) using random numbers from a Normal
distribution with zero mean and unit standard deviation.
(b) Multivariate spatio-temporal time series (3000 time steps)
drawn from the class of diffusively coupled map lattices in
one dimension with sites $i = 1, 2, \dots, n$ of the form
$y^i_{t+1} = (1-\epsilon) f(y^i_t) + \frac{\epsilon}{2}\left[f(y^{i+1}_t) + f(y^{i-1}_t)\right]$,
where $f(y) = 1 - a y^2$ is the logistic map whose dynamics is
controlled by the parameter $a$, and the parameter $\epsilon$ is a
measure of coupling between nearest-neighbor lattice sites. We
use parameters $a = 1.97$, $\epsilon = 0.4$ for the dynamics to be in
the regime of spatio-temporal chaos. We choose $n = 500$ and
iterate, starting from random initial conditions, for
$p = 5 \times 10^7$ time steps, after discarding $10^{?}$ transient
iterates. Also, we choose periodic boundary conditions,
$y^{n+1}_t = y^1_t$. (c) Multiplicative stochastic process
GARCH(1,1) for a random variable $x_t$ with zero mean and
variance $\sigma_t^2$, characterized by a Gaussian conditional
probability distribution function $f_t(x)$:
$\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \beta_1 \sigma_{t-1}^2$,
using parameters $\alpha_0 = 0.00023$, $\alpha_1 = 0.09$ and
$\beta_1 = 0.01$ (3000 time steps). (d) Empirical return time
series of the S&P500 stock index (8938 time steps).
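The synthetic series (b) and (c) in Fig. 1 can be generated along the following lines. This is a sketch, not the code used for the figure, and it rests on several assumptions:

- the symmetric $\epsilon/2$ nearest-neighbour coupling written in the caption;
- standard-normal innovations for the GARCH(1,1) process, started at its unconditional variance $\alpha_0/(1-\alpha_1-\beta_1)$;
- an arbitrary transient length `n_transient`, because the number of discarded iterates cannot be recovered from the caption.

The function names are hypothetical.

```python
import random

def garch11(n_steps, alpha0=0.00023, alpha1=0.09, beta1=0.01, seed=0):
    """GARCH(1,1): x_t = sigma_t * z_t with z_t ~ N(0, 1) and
    sigma_t^2 = alpha0 + alpha1 * x_{t-1}^2 + beta1 * sigma_{t-1}^2."""
    if alpha1 + beta1 >= 1.0:
        raise ValueError("need alpha1 + beta1 < 1 for a finite unconditional variance")
    rng = random.Random(seed)
    var = alpha0 / (1.0 - alpha1 - beta1)  # start from the unconditional variance
    x = 0.0
    series = []
    for t in range(n_steps):
        if t > 0:
            var = alpha0 + alpha1 * x * x + beta1 * var
        x = var ** 0.5 * rng.gauss(0.0, 1.0)
        series.append(x)
    return series

def coupled_map_lattice(n_sites, n_steps, a=1.97, eps=0.4, n_transient=1000, seed=0):
    """Diffusively coupled logistic maps f(y) = 1 - a*y^2 on a ring (periodic boundaries).
    n_transient is an assumed placeholder, not the value used for Fig. 1."""
    rng = random.Random(seed)
    y = [rng.uniform(-1.0, 1.0) for _ in range(n_sites)]
    history = []
    for t in range(n_transient + n_steps):
        fy = [1.0 - a * v * v for v in y]
        # fy[i - 1] wraps to the last site for i = 0; the modulo handles the other end.
        y = [(1.0 - eps) * fy[i] + 0.5 * eps * (fy[(i + 1) % n_sites] + fy[i - 1])
             for i in range(n_sites)]
        if t >= n_transient:
            history.append(y)  # y is a fresh list each step, so storing it is safe
    return history
```

For example, `garch11(3000)` gives a series as long as panel (c), and `coupled_map_lattice(500, 3000)` returns 3000 snapshots of a 500-site lattice. The pure-Python loop is slow but adequate for a sketch. Because $a \le 2$, the map sends $[-1, 1]$ into itself, so the lattice values stay bounded.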



For analysis and smoothing purposes, the data are divided
time-wise into M windows $t = 1, 2, \dots, M$ of width T,
where T corresponds to the number of daily returns
included in the window. Consecutive windows overlap
with each other, the extent of which is dictated by
the window step length parameter $\delta T$, which describes
the displacement of the window and is also measured in
trading days. The choice of window width is a trade-off
between too noisy data for small window widths and too
smoothed data for large ones. The results presented
in this paper were calculated from monthly stepped
four-year windows, i.e. $\delta T = 250/12 \approx 21$ days and
T = 1000 days. We have explored a wide range of values
for both parameters, and the cited values were found to
be optimal [?]. With these choices, the overall number of
windows is M = 195.
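The window bookkeeping can be made concrete with a short sketch. The text states neither of the two conventions it relies on, so both are assumptions:

- the fractional monthly step $\delta T = 250/12$ is rounded down to a whole trading day for each window start;
- windows are counted over the 5055 daily returns obtained from 5056 price quotes.

Under these assumptions the count reproduces the quoted $M = 195$. The function name is illustrative.

```python
import math

def window_starts(n_returns, width=1000, step=250 / 12):
    """Start indices of overlapping windows of `width` returns, shifted by `step` trading days.
    Assumption: each fractional start k * step is rounded down to a whole day."""
    starts = []
    k = 0
    while True:
        s = math.floor(k * step)
        if s + width > n_returns:
            break
        starts.append(s)
        k += 1
    return starts

starts = window_starts(5056 - 1)  # 5056 prices give 5055 daily returns
print(len(starts))
```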

In order to investigate correlations between stocks, we
first denote the closure price of stock i at time $\tau$ by
$P_i(\tau)$ (note that $\tau$ refers to a date, not a time window).
We focus our attention on the logarithmic return of stock
i, given by $r_i(\tau) = \ln P_i(\tau) - \ln P_i(\tau - 1)$, which for a
sequence of consecutive trading days, i.e. those
encompassing the given window t, form the return vector
$\mathbf{r}_i^t$. In order to characterize the synchronous time
evolution of assets, we use the equal-time correlation
coefficients between assets i and j, defined as



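$$ \rho_{ij}^t = \frac{\langle \mathbf{r}_i^t \mathbf{r}_j^t \rangle - \langle \mathbf{r}_i^t \rangle \langle \mathbf{r}_j^t \rangle}{\sqrt{\left[\langle (\mathbf{r}_i^t)^2 \rangle - \langle \mathbf{r}_i^t \rangle^2\right]\left[\langle (\mathbf{r}_j^t)^2 \rangle - \langle \mathbf{r}_j^t \rangle^2\right]}}, \tag{2} $$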



where $\langle \dots \rangle$ indicates a time average over the consecutive
trading days included in the return vectors. These
correlation coefficients fulfill the condition $-1 \le \rho_{ij} \le 1$.
If $\rho_{ij} = 1$, the stock price changes are completely
correlated; if $\rho_{ij} = 0$, they are uncorrelated; and if
$\rho_{ij} = -1$, they are completely anti-correlated [?]. These
correlation coefficients form an $N \times N$ correlation matrix
$C^t$, which serves as the basis for the trees discussed in
this paper.
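Eq. (2) translates directly into code. The sketch below computes $\rho_{ij}^t$ for one window and assembles the $N \times N$ matrix $C^t$. It also includes the mean of the $N(N-1)/2$ off-diagonal coefficients. We assume this is what the mean correlation coefficient $\bar{\rho}(t)$ used later refers to, since the section does not define it explicitly. Names are illustrative.

```python
import math

def corr(ri, rj):
    """Eq. (2): equal-time correlation of two return vectors from the same window,
    with <...> taken as a plain average over the days in the window."""
    if len(ri) != len(rj):
        raise ValueError("return vectors must cover the same days")
    n = len(ri)
    mi, mj = sum(ri) / n, sum(rj) / n
    cov = sum(a * b for a, b in zip(ri, rj)) / n - mi * mj
    var_i = sum(a * a for a in ri) / n - mi * mi
    var_j = sum(b * b for b in rj) / n - mj * mj
    return cov / math.sqrt(var_i * var_j)

def correlation_matrix(returns):
    """returns: N return vectors for one window -> N x N matrix C^t (unit diagonal)."""
    N = len(returns)
    return [[1.0 if i == j else corr(returns[i], returns[j]) for j in range(N)]
            for i in range(N)]

def mean_correlation(C):
    """Average of the N(N-1)/2 off-diagonal elements (assumed definition of rho-bar)."""
    N = len(C)
    vals = [C[i][j] for i in range(N) for j in range(i + 1, N)]
    return sum(vals) / len(vals)
```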

We construct an asset tree according to the methodology
of Mantegna [?]. For the purpose of constructing asset
trees, we define a distance between a pair of stocks.
This distance is associated with the edge connecting the
stocks, and it is expected to reflect the level at which the
stocks are correlated. We use a simple non-linear
transformation $d_{ij}^t = \sqrt{2(1 - \rho_{ij}^t)}$ to obtain distances with
the property $2 \ge d_{ij} \ge 0$, forming an $N \times N$ symmetric
distance matrix $D^t$. So, if $d_{ij} = 0$, the stock price
changes are completely correlated; if $d_{ij} = 2$, the stock
price changes are completely anti-correlated. The trees
for different time windows are not independent of each
other, but form a series through time. Consequently, this
multitude of trees is interpreted as a sequence of
evolutionary steps of a single dynamic asset tree. We also
require an additional hypothesis about the topology of
the metric space, the ultrametricity hypothesis. In
practice, it leads to determining the minimum spanning tree
(MST) of the distances, denoted $T^t$. The spanning tree
is a simply connected acyclic graph that connects all N
nodes (stocks) with N - 1 edges such that the sum of all
edge weights, $\sum_{d_{ij}^t \in T^t} d_{ij}^t$, is minimum.
We refer to the minimum spanning tree at time t by the
notation $T^t = (V, E^t)$, where V is a set of vertices and
$E^t$ is a corresponding set of unordered pairs of vertices,
or edges. Since the spanning tree criterion requires all N
nodes to be always present, the set of vertices V is time
independent, which is why the time superscript has been
dropped from its notation. The set of edges $E^t$, however,
does depend on time, as the edge lengths in the matrix
$D^t$ are expected to evolve over time, and thus different
edges get selected in the tree at different times.
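A minimal sketch of the tree construction follows: the distance transform $d_{ij} = \sqrt{2(1-\rho_{ij})}$, then an MST. The section does not say which MST algorithm was used. Prim's algorithm on the dense distance matrix, used here, is one standard choice and runs in $O(N^2)$, which is comfortable for $N = 477$.

The `normalized_tree_length` helper assumes the usual definition $L(t) = \frac{1}{N-1}\sum_{d_{ij}^t \in T^t} d_{ij}^t$ from the asset-tree literature. That quantity is referred to later but not defined in this section. All names are hypothetical.

```python
import math

def distance_matrix(C):
    """d_ij = sqrt(2 * (1 - rho_ij)); max(0, ...) guards against tiny negative rounding."""
    return [[math.sqrt(max(0.0, 2.0 * (1.0 - rho))) for rho in row] for row in C]

def minimum_spanning_tree(D):
    """Prim's algorithm on a dense symmetric distance matrix.
    Returns the N - 1 tree edges as (i, j, d_ij) tuples."""
    N = len(D)
    in_tree = [False] * N
    best = [math.inf] * N   # cheapest known edge linking each vertex to the growing tree
    parent = [-1] * N
    best[0] = 0.0
    edges = []
    for _ in range(N):
        u = min((v for v in range(N) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, D[parent[u]][u]))
        for v in range(N):
            if not in_tree[v] and D[u][v] < best[v]:
                best[v] = D[u][v]
                parent[v] = u
    return edges

def normalized_tree_length(edges):
    """Assumed definition: mean length of the N - 1 tree edges."""
    return sum(d for _, _, d in edges) / len(edges)

C = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
tree = minimum_spanning_tree(distance_matrix(C))
print(sorted((i, j) for i, j, _ in tree))
```

In this toy example, stocks 0 and 1 are strongly correlated, so they are joined directly. Stock 2 hangs off stock 1 because $\rho_{12} > \rho_{02}$, which gives a shorter distance.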



B. Market characterization

We plot the distribution of (i) the distance elements $d_{ij}^t$
contained in the distance matrix $D^t$ (Fig. 2) and (ii) the
distance elements $d_{ij}^t$ contained in the asset (minimum
spanning) tree $T^t$ (Fig. 3). In both plots, but most
prominently in Fig. [?], there appears to be a discontinuity
in the distribution between roughly 1986 and 1990. The
part that has been cut out, pushed to the left and made
flatter, is a manifestation of Black Monday (October 19,
1987), and its length along the time axis is related to the
choice of window width T [?, ?]. Also, note that in the
distribution of tree edges in Fig. 3, most edges included in
the tree seem to come from the area to the right of the
value 1.1, and the largest distance element is
$d_{max} = 1.3549$.

FIG. 2: Distribution of all $N(N-1)/2$ distance elements $d_{ij}$
contained in the distance matrix $D^t$ as a function of time.

FIG. 3: Distribution of the $(N-1)$ distance elements $d_{ij}$
contained in the asset (minimum spanning) tree $T^t$ as a
function of time.

1. Tree occupation and central vertex

We focus on characterizing the spread of nodes on
the tree by introducing the quantity of mean occupation
layer

$$ l(t, v_c) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{lev}(v_i^t), \tag{3} $$

where $\mathrm{lev}(v_i^t)$ denotes the level of vertex $v_i^t$. The levels,
not to be confused with the distances $d_{ij}$ between nodes,
are measured in natural numbers relative to the central
vertex $v_c$, whose level is taken to be zero. Here the
mean occupation layer indicates the layer on which the
mass of the tree is, on average, conceived to be located.
The central vertex is considered to be the parent of all
other nodes in the tree, and is also known as the root of
the tree. It is used as the reference point in the tree,
against which the locations of all other nodes are
measured. Thus all other nodes in the tree are descendants
of the central vertex. Although there is some arbitrariness
in the choice of the central vertex, we propose that it be
central in the sense that any change in its price strongly
affects the course of events in the market as a whole. We
have proposed three alternative definitions for the central
vertex in our studies, all yielding similar and, in most
cases, identical outcomes. The idea is to find the node that
is most strongly connected to its nearest neighbors. For
example, according to one definition, the central node is
the one with the highest vertex degree, i.e. the number of
edges incident with the vertex (equivalently, the number
of its neighbors). Also, one may have either (i) a static
(fixed at all times) or (ii) a dynamic (updated at each time
step) central vertex, but again the results do not seem to
vary significantly. We can then study the variation of the
topological properties and nature of the trees with time.
This type of visualization tool can sometimes provide
deeper insight into the dynamical system.

2. Economic taxonomy



Mantegna's idea of linking stocks in an ultrametric
space was motivated a posteriori by the property of such
a space to provide a meaningful economic taxonomy. In
[7], Mantegna examined the meaningfulness of the taxonomy
by comparing the grouping of stocks in the tree with
a third-party reference grouping of stocks by their
industry and other classifications. In this case, the
reference was provided by Forbes [?], which uses its own
classification system, assigning each stock a sector
(higher level) and an industry (lower level) category. In
order to visualize the grouping of stocks, we constructed
a sample asset tree (Fig. 4) for a smaller dataset [11],
which consists of 116 S&P 500 stocks, extending from the
beginning of 1982 to the end of 2000, resulting in a total
of 4787 price quotes per stock. The window width was set
at T = 1000, and the sample tree shown is located
time-wise at $t = t^*$, corresponding to 1 January 1998. The
stocks in this dataset fall into 12 sectors: Basic Materials,
Capital Goods, Conglomerates, Consumer/Cyclical,
Consumer/Non-Cyclical, Energy, Financial, Healthcare,
Services, Technology, Transportation and Utilities. The
sectors are indicated in the tree (see Fig. 4) with different
markers, while the industry classifications are omitted
for reasons of clarity. We use the term sector exclusively
to refer to the given third-party classification system of
stocks. The term branch refers to a subset of the tree:
all the nodes that share the specified common parent.
In addition to the parent, we need a reference point to
indicate the generational direction (i.e. who is whose
parent) in order for a branch to be well defined.
Without this reference there is no way to determine where
one branch ends and another begins. In our case, the
reference is the central node. There are some branches in
the tree in which most of the stocks belong to just one
sector, indicating that the branch is fairly homogeneous
with respect to business sectors. This finding is in
accordance with those of Mantegna [7], although there are
branches that are fairly heterogeneous, such as the one
extending directly downwards from the central vertex
(see Fig. 4).

FIG. 4: Snapshot of a dynamic asset tree connecting the
examined 116 stocks of the S&P 500 index. The tree was
produced using a four-year window width and it is centered
on January 1, 1998. Business sectors are indicated according
to Forbes, http://www.forbes.com. In this tree, General
Electric (GE) was used as the central vertex and eight layers
can be identified.
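The tree quantities used in this subsection and the previous one can all be sketched with a breadth-first search from the central vertex. They are:

- levels relative to the central vertex;
- the mean occupation layer of Eq. (3);
- a degree-based central vertex;
- the branch below a given node.

The tie-breaking rule for the central vertex and the `sector_homogeneity` score are our own illustrative choices, not definitions from the text. Edges are plain index pairs $(i, j)$, and all names are hypothetical.

```python
from collections import Counter, deque

def adjacency(edges, n_nodes):
    """Undirected adjacency lists from (i, j) edge pairs of a tree."""
    adj = [[] for _ in range(n_nodes)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj

def central_vertex_by_degree(adj):
    """Vertex with the highest degree; ties go to the lowest index (our choice)."""
    return max(range(len(adj)), key=lambda v: len(adj[v]))

def levels_and_parents(adj, root):
    """BFS from the central vertex: level = number of edges to the root."""
    lev = [-1] * len(adj)
    parent = [-1] * len(adj)
    lev[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if lev[v] < 0:
                lev[v] = lev[u] + 1
                parent[v] = u
                queue.append(v)
    return lev, parent

def mean_occupation_layer(adj, root):
    """Eq. (3): average level over all N vertices, the central vertex at level 0."""
    lev, _ = levels_and_parents(adj, root)
    return sum(lev) / len(lev)

def branch(adj, root, p):
    """All nodes below p once the tree is oriented away from the central vertex."""
    _, parent = levels_and_parents(adj, root)
    members, stack = [], [p]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if parent[v] == u:   # v is a child of u in the rooted tree
                members.append(v)
                stack.append(v)
    return sorted(members)

def sector_homogeneity(members, sector_of):
    """Share of a branch belonging to its most common sector (1.0 = one sector)."""
    counts = Counter(sector_of[v] for v in members)
    return max(counts.values()) / len(members)

edges = [(0, 1), (0, 2), (0, 3), (0, 6), (3, 4), (3, 5)]
adj = adjacency(edges, 7)
centre = central_vertex_by_degree(adj)
print(centre, mean_occupation_layer(adj, centre), branch(adj, centre, 3))
```

Re-rooting the same tree at a different central vertex changes both the levels and the branch membership. This is the sense in which a branch is only well defined relative to the reference node.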

C. Portfolio analysis 

Next, we apply the concepts and measures discussed above
to the portfolio optimization problem, a basic problem of
financial analysis. This is done in the hope that the asset
tree could serve as another type of quantitative approach
to, and/or visualization aid of, the highly interconnected
market, thus acting as a tool supporting the
decision-making process. We consider a general Markowitz
portfolio P(t) with the asset weights $w_1, w_2, \dots, w_N$.
In the classic Markowitz portfolio optimization scheme,
financial assets are characterized by their average risk
and return, where the risk associated with an asset is
measured by the standard deviation of its returns. The
Markowitz optimization is usually carried out using
historical data. The aim is to optimize the asset weights
so that the overall portfolio risk is minimized for a given
portfolio return $r_P$. In the dynamic asset tree framework,
however, the task is to determine how the assets are
located with respect to the central vertex.
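For reference, the classic Markowitz ingredients can be sketched as follows: portfolio return and risk for given weights, and the global minimum-variance weights $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1})$ computed from a covariance matrix $\Sigma$. Note the assumption: this closed form allows short-selling (negative weights). The no-short-selling variant used in the text requires a quadratic-programming solver instead. Names are hypothetical, and a real application would estimate $\Sigma$ from the returns in each window.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def min_variance_weights(cov):
    """Global minimum-variance portfolio; short-selling (w_i < 0) is allowed here."""
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [zi / total for zi in z]

def portfolio_risk(w, cov):
    """Standard deviation sqrt(w^T Sigma w) of the portfolio return."""
    n = len(w)
    return math.sqrt(sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n)))

def portfolio_return(w, mean_returns):
    return sum(wi * mi for wi, mi in zip(w, mean_returns))

cov = [[0.04, 0.0], [0.0, 0.01]]   # hypothetical, uncorrelated assets
w = min_variance_weights(cov)
print(w, portfolio_risk(w, cov))
```

With uncorrelated assets the weights are proportional to the inverse variances, so the lower-variance asset dominates. The combined risk is also below that of either asset alone.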

Let $r_m$ and $r_M$ denote the returns of the minimum risk and
maximum return portfolios, respectively. The expected
portfolio return varies between these two extremes, and
can be expressed as $r_{P,\theta} = (1-\theta) r_m + \theta r_M$, where $\theta$ is
a fraction between 0 and 1. Hence, when $\theta = 0$, we have
the minimum risk portfolio, and when $\theta = 1$, we have the
maximum return (maximum risk) portfolio. The higher
the value of $\theta$, the higher the expected portfolio return
$r_{P,\theta}$ and, consequently, the higher the risk the investor
is willing to absorb. We define a single measure, the
weighted portfolio layer, as

$$ l_P(t, \theta) = \sum_{i \in P(t,\theta)} w_i \, \mathrm{lev}(v_i^t), \tag{4} $$

where $\sum_i w_i = 1$ and further, as a starting point, the
constraint $w_i \ge 0$ is imposed for all i, which is equivalent
to assuming that there is no short-selling. The purpose of
this constraint is to prevent negative values of $l_P(t)$, which
would not have a meaningful interpretation in our
framework of trees with a central vertex. This restriction
will be discussed further shortly.
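Eq. (4) and the interpolated target return are simple weighted sums. The sketch below makes the no-short-selling constraint explicit by rejecting negative weights. It assumes the vertex levels have already been computed relative to the chosen central vertex, for example by a breadth-first search over the tree. The function names are illustrative.

```python
def target_return(theta, r_min_risk, r_max_return):
    """r_{P,theta} = (1 - theta) * r_m + theta * r_M with 0 <= theta <= 1."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return (1.0 - theta) * r_min_risk + theta * r_max_return

def weighted_portfolio_layer(weights, lev):
    """Eq. (4): sum of w_i * lev(v_i) over the assets in the portfolio.
    weights: dict {asset index: weight}; lev: levels from the central vertex."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    if any(w < 0 for w in weights.values()):
        raise ValueError("negative weight: short-selling is excluded in this sketch")
    return sum(w * lev[i] for i, w in weights.items())

lev = [0, 1, 1, 1, 2, 2, 1]   # hypothetical levels, central vertex = 0
print(weighted_portfolio_layer({4: 0.5, 5: 0.3, 1: 0.2}, lev))
print(target_return(0.5, 0.01, 0.05))
```

A portfolio concentrated on the outer layers yields a large $l_P$. In the example, 80% of the weight sits on level-2 vertices, so $l_P = 1.8$, well above the mean occupation layer of that tree ($8/7 \approx 1.14$).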

Fig. 5 shows the behavior of the mean occupation
layer $l(t)$ and the weighted minimum risk portfolio layer
$l_P(t, \theta = 0)$. We find that the portfolio layer is higher
than the mean layer at all times. The difference between
the layers depends on the window width, here set at
T = 1000, and on the type of central vertex used. The
upper plot in Fig. 5 is produced using the static central
vertex (GE), and the difference in layers is found to be
1.47. The lower one is produced using a dynamic central
vertex, selected with the vertex degree criterion, in which
case a difference of 1.39 is found. Here, we have assumed
the no short-selling condition. However, it turns out that,
in practice, the weighted portfolio layer never assumes
negative values, so the no short-selling condition is, in
fact, not necessary. Only minor differences are observed
in the results between banning and allowing short-selling.
Further, the difference in layers is slightly larger for the
static than for the dynamic central vertex, although not
by a significant amount.

FIG. 5: Plot of the weighted minimum risk portfolio layer
$l_P(t, \theta = 0)$ with no short-selling and the mean occupation
layer $l(t, v_c)$ against time. Top: static central vertex;
bottom: dynamic central vertex according to the vertex
degree criterion.

As the stocks of the minimum risk portfolio are found
on the outskirts of the tree, we expect larger trees (higher
L) to have greater diversification potential, i.e. greater
scope for the stock market to eliminate the specific risk of
the minimum risk portfolio. In order to look at this, we
calculated the mean-variance frontiers for the ensemble of
477 stocks using T = 1000 as the window width. If we
study the level of portfolio risk as a function of time, we
find a similarity between the risk curve and the curves
of the mean correlation coefficient $\bar{\rho}$ and the normalized
tree length L [6]. Earlier, when the smaller dataset of 116
stocks (consisting primarily of important industry giants)
was used, we found Pearson's linear correlation between
the risk and the mean correlation coefficient $\bar{\rho}(t)$ to be
0.82, while that between the risk and the normalized tree
length $L(t)$ was $-0.90$. Therefore, for that dataset, the
normalized tree length was able to explain the
diversification potential of the market better than the
mean correlation coefficient. For the current set of 477
stocks, which also includes less influential companies, the
Pearson's linear and Spearman's rank-order correlation
coefficients between the risk and the mean correlation
coefficient are 0.86 and 0.77, respectively, and those
between the risk and the normalized tree length are $-0.78$
and $-0.65$, respectively.
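The two coefficients quoted above measure different things. Pearson's coefficient captures linear association. Spearman's rank-order coefficient is Pearson's coefficient applied to the ranks, so it captures any monotonic association. A stdlib-only sketch follows. It uses average ranks for ties, which is a common convention; the text does not specify one.

```python
import math

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rank-order coefficient: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but non-linear
print(pearson(x, y), spearman(x, y))
```

Applied to the risk curve and to $\bar{\rho}(t)$ or $L(t)$ sampled over the M windows, these functions give coefficients of the kind reported above.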

Thus far, we have only examined the location of stocks
in the minimum risk portfolio, for which $\theta = 0$. However,
as we increase $\theta$ towards unity, portfolio risk as a
function of time soon starts behaving very differently
from the mean correlation coefficient and the normalized
tree length, as shown in Fig. [?]. Consequently, it is no
longer useful in describing the diversification potential of
the market. However, another interesting result is
noteworthy: the average weighted portfolio layer $l_P(t, \theta)$
decreases for increasing values of $\theta$. This implies that,
out of all the possible Markowitz portfolios, the minimum
risk portfolio stocks are located furthest away from the
central vertex, and as we move towards portfolios with
higher expected return, the stocks included in these
portfolios are located closer to the central vertex. It may
be mentioned that we have not included the weighted
portfolio layer for $\theta = 1$, as it is not very informative.
This is because the maximum return portfolio comprises
only one asset (the maximum return asset in the current
time window) and, therefore, $l_P(t, \theta = 1)$ fluctuates
wildly as the maximum return asset changes over time.

FIG. 6: Plots of the weighted portfolio layer $l_P(t, \theta)$ for
different values of $\theta$.

We believe these results have potential for practical
application. Stocks included in low risk portfolios are
consistently located further away from the central node
than those included in high risk portfolios. Consequently,
the radial distance of a node, i.e. its occupation layer, is
meaningful. We conjecture that the location of a company
within its cluster reflects its position with regard to
internal, or cluster-specific, risk. Thus the
characterization of stocks by their branch, as well as by
their location within the branch, would enable us to
identify the degree of interchangeability of different stocks
in the portfolio. In most cases, we would be able to pick
two stocks from different asset tree clusters, but from
nearby layers, and interchange them in the portfolio
without considerably altering the characteristics of the
portfolio. Therefore, dynamic asset trees may facilitate
the incorporation of subjective judgment into the portfolio
optimization problem.

III. SUMMARY 

We have studied the dynamics of asset trees and applied
them to market taxonomy and portfolio analysis. We have
noted that the tree evolves over time and that the mean
occupation layer fluctuates as a function of time,
experiencing a downfall at the time of market crisis due
to topological changes in the asset tree. For the portfolio
analysis, it was found that the stocks included in the
minimum risk portfolio tend to lie on the outskirts of the
asset tree: on average, the weighted portfolio layer can
be almost one and a half levels higher, or further away
from the central vertex, than the mean occupation layer
for a window width of four years. Finally, the asset tree
can be used as a visualization tool, and even though it is
strongly pruned, it still retains all the essential
information of the market (starting from the correlations
in stock prices) and can be used to add subjective
judgement to the portfolio optimization problem.